Conversation

@valadaptive
Contributor

I meant to use this method to implement the from_slice method, but apparently used load_array instead.

@LaurenzV
Collaborator

Perhaps this will fix our issues with #171!

@Shnatsel
Contributor

This fixed many of my issues with QuState/PhastFT#58!

There's still a performance gap vs wide but this closes much of it!

@LaurenzV
Collaborator

How much is left?

@Shnatsel
Contributor

Shnatsel commented Jan 22, 2026

fearless_simd is 7% to 13% slower on Apple M4 depending on the benchmark (based on a quick run, not a full run; I can do a full run with more tests later).

On x86 (Zen4) it ranges from on par to 6% worse, but that's not perfectly apples-to-apples: Zen4 has AVX-512 (emulated, double-pumped), which wide can use but fearless_simd cannot, so I'm not too worried about that.

@valadaptive valadaptive added this pull request to the merge queue Jan 23, 2026
Merged via the queue into linebender:main with commit 744661d Jan 23, 2026
18 checks passed
@valadaptive valadaptive deleted the load-array-ref branch January 23, 2026 02:55
3 participants